Abstract: There is a trend of applying machine learning
algorithms to cognitive radio. One fundamental open problem
is to determine how and where these algorithms are useful in a
cognitive radio network. In radar and sensing signal processing,
the control of degrees of freedom (DOF), or dimensionality, is
the first step, called pre-processing. In this paper, we propose
combining dimensionality reduction with a support vector machine
(SVM), in addition to applying SVM alone, for classification in
cognitive radio. Measured Wi-Fi signals with high signal-to-noise
ratio (SNR) are used in the experiments. The DOF of the Wi-Fi
signals is extracted by dimensionality reduction techniques.
Experimental results show that, with dimensionality reduction,
classification performs much better with fewer features than
without dimensionality reduction. The error rates of the proposed
algorithm with only one feature can match the error rates obtained
with 13 features of the original data. The proposed method will be
further tested in our cognitive radio network testbed.

Index Terms — Degrees of freedom (DOF), cognitive radio, 
support vector machine (SVM), dimensionality reduction. 

I. Introduction 

Intelligence and learning are key factors in cognitive radio.
Recently, there has been a trend of applying machine learning
algorithms to cognitive radio [1]. Machine learning [2] is a
discipline that designs algorithms enabling computers to imitate
human behavior, which includes learning to recognize complex
patterns and making decisions based on experience automatically
and intelligently. Machine learning is precisely what needs to be
introduced to cognitive radio. One fundamental open problem is to
determine how and where these algorithms are useful in a cognitive
radio network. Such a systematic investigation is missing in the
literature. The motivation of this paper is to fill this gap.

In radar and sensing signal processing, the control of
degrees of freedom (DOF), or intrinsic dimensionality, is the
first step, called pre-processing. Network dimensionality,
on the other hand, has received attention in the information
theory literature. One naturally wonders how network (signal)
dimensionality affects the performance of system operation.
Here we study, as an illustrative example, state classification
of measured Wi-Fi signals in cognitive radio in this context.
Both linear methods, such as principal component analysis (PCA),
and nonlinear methods, such as kernel principal component analysis
(KPCA) and maximum variance unfolding (MVU), are studied
by combining them with the support vector machine (SVM) [3]-[8],
the latest breakthrough in machine learning. PCA, KPCA and MVU
are systematically studied in this paper so that data with
different structures can be handled. The reduced-dimension data
retain most of the useful information of the original data but
have far fewer dimensions.

SVM is both a linear and a nonlinear classifier and has
been successfully applied in many areas such as handwritten
digit recognition [9]-[11] and object recognition [12]. In
cognitive radio, SVM has been used for channel and
modulation selection [13], signal classification [14]-[16] and
spectrum estimation [17]. In [14], a method that combines
feature extraction based on spectral correlation analysis with
SVM to classify signals was proposed. In this paper, SVM
is explored as a classifier for measured Wi-Fi signal data.

Dimensionality reduction methods are innovative and important
tools in machine learning [18]. The original data collected
in cognitive radio may contain many features; however, these
features are usually highly correlated, redundant and noisy.
Thus the intrinsic dimensionality of the collected data is much
lower than the number of original features. Dimensionality
reduction attempts to select or extract a lower-dimensional
representation that retains most of the useful information.

PCA [19] is the best-known linear dimensionality reduction
method. PCA takes the variance among data as the useful
information: it aims to find a subspace $\Omega$ that maximally
retains the variance of the original dataset. On the
other hand, although linear PCA can work well (when such
a subspace $\Omega$ exists), it fails to detect the nonlinear
structure of data. A nonlinear dimensionality reduction method
called KPCA [20] can be used for this purpose. KPCA
uses the kernel trick [21] to map the original data into a
feature space $F$, and then performs PCA in $F$ without knowing
the mapping explicitly. The leading eigenvector of the sample
covariance matrix in the feature space has been applied to
spectrum sensing in cognitive radio [22].

Manifold learning has become a very active topic in nonlinear
dimensionality reduction. The basic assumption of these manifold
learning algorithms is that the input data lie on or close to a
smooth manifold [23], [24]. The cognitive radio network happens
to fall into this category. Many promising methods have been
proposed in this context, including isometric mapping (Isomap)
[25], local linear embedding (LLE) [26], Laplacian eigenmaps [27],
local tangent space alignment (LTSA) [28], MVU [29], Hessian
eigenmaps [30], manifold charting [31], diffusion maps [32] and
Riemannian manifold learning (RML) [33]. The MVU approach will be
applied to our problem. MVU uses semidefinite programming, the
latest breakthrough in convex optimization, to solve a convex
optimization model that maximizes the output variance, subject to
the constraints of zero mean and local isometry.

As mentioned above, the collected data often contain too
much redundant information. The redundant information not
only complicates the algorithms but also conceals the underlying
reason for their performance. Therefore, the intrinsic
dimensionality of the collected data should be extracted. In this
paper, in addition to applying SVM alone for classification, the
combination of dimensionality reduction with SVM is proposed.
The intrinsic structure of the collected data in cognitive radio
determines whether a linear or a nonlinear dimensionality
reduction method should be used.

The contributions of this paper are as follows. First, SVM
is shown to classify the states of the Wi-Fi signal successfully.
Second, the combination of SVM and dimensionality reduction
is proposed for cognitive radio for the first time, with the
motivation stated above. Third, measured Wi-Fi signals are
used to validate the proposed approach. Both linear and
nonlinear dimensionality reduction methods are systematically
studied and applied to the Wi-Fi signal.

We are building a cognitive radio network testbed [34].
The dimensionality reduction techniques can be tested in the
network testbed in real time. More applications, such as smart
grid [34]-[36] and wireless tomography [37], [38], can benefit
from the network testbed and machine learning techniques.

The organization of this paper is as follows. In Section II,
SVM is briefly reviewed. Three different dimensionality reduction
methods are introduced in Section III. The procedure of
combining dimensionality reduction with SVM is described in
Section IV. Measured Wi-Fi signals are introduced in Section V.
The experimental results are shown in Section VI. Finally, the
paper is concluded in Section VII.



II. Support Vector Machine 

Supervised learning is one of the three major learning
types in machine learning. The dataset in supervised learning
consists of $M$ pairs of inputs and outputs (labels)

$$ (x_i, l_i), \quad i = 1, 2, \cdots, M. \tag{1} $$

Suppose the training samples and testing samples are $x_i$ with
indexes $i_t$, $t = 1, 2, \cdots, T$, and $i_s$, $s = 1, 2, \cdots, S$,
respectively. A supervised learning algorithm seeks a mapping from
inputs to outputs using the training set and predicts outputs for
new inputs based on that mapping. If the output is a categorical or
nominal value, this becomes a classification problem. Classification
is a common task in machine learning. SVM [6], [7] will be
employed in the two-class classification experiments later.

Consider, for example, the simplest two-class classification
task with linearly separable data. In this case,

$$ l_{i_t} \in \{-1, 1\} \tag{2} $$

indicates which class $x_{i_t}$ belongs to. SVM attempts to find
the separating hyperplane

$$ w \cdot x + b = 0 \tag{3} $$

with the largest margin [7] satisfying the following constraints:

$$ \begin{aligned} w \cdot x_{i_t} + b &\ge 1 && \text{for } l_{i_t} = 1 \\ w \cdot x_{i_t} + b &\le -1 && \text{for } l_{i_t} = -1 \end{aligned} \tag{4} $$

for the linearly separable case, in which $w$ is the normal vector of
the hyperplane and $\cdot$ stands for the inner product. The
constraints (4) can be combined into

$$ l_{i_t} (w \cdot x_{i_t} + b) \ge 1. \tag{5} $$

The requirement of a separating hyperplane with the largest
margin leads to the following optimization model:

$$ \begin{aligned} &\text{minimize} && \|w\| \\ &\text{subject to} && l_{i_t} (w \cdot x_{i_t} + b) \ge 1 \end{aligned} \tag{6} $$
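in which $t = 1, 2, \cdots, T$. The dual form of (6), obtained by
introducing Lagrange multipliers

$$ \alpha_{i_t} \ge 0, \quad t = 1, 2, \cdots, T, \tag{7} $$

is

$$ \begin{aligned} &\text{maximize} && \sum_{i_t} \alpha_{i_t} - \frac{1}{2} \sum_{i_t, j_t} \alpha_{i_t} \alpha_{j_t} l_{i_t} l_{j_t} \, x_{i_t} \cdot x_{j_t} \\ &\text{subject to} && \sum_{i_t} \alpha_{i_t} l_{i_t} = 0, \quad \alpha_{i_t} \ge 0 \end{aligned} \tag{8} $$

in which

$$ w = \sum_{i_t} \alpha_{i_t} l_{i_t} x_{i_t}. \tag{9} $$

Those $x_{i_t}$ with $\alpha_{i_t} > 0$ are called support vectors.
Substituting (9) into (3) gives the separating hyperplane

$$ f(x) = \sum_{i_t} \alpha_{i_t} l_{i_t} \, x_{i_t} \cdot x + b. \tag{10} $$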

The strength of (10) is that it relies only on the inner
product between the training points and the testing point. This
allows SVM to be easily generalized to a nonlinear SVM. If $f(x)$ is
not a linear function of the data, the nonlinear SVM can
be obtained by introducing a kernel function

$$ k(x_{i_t}, x) = \varphi(x_{i_t}) \cdot \varphi(x) \tag{11} $$

to implicitly map the original data into a higher dimensional
feature space $F$, where $\varphi$ is the mapping from the original
space to the feature space. In $F$, the $\varphi(x_{i_t})$ are linearly
separable. The separating hyperplane in the feature space $F$ is
easily generalized into the following form:

$$ f(x) = \sum_{i_t} \alpha_{i_t} l_{i_t} \, \varphi(x_{i_t}) \cdot \varphi(x) + b = \sum_{i_t} \alpha_{i_t} l_{i_t} \, k(x_{i_t}, x) + b. \tag{12} $$

By introducing the kernel function, the mapping $\varphi$ need not
be explicitly known, which greatly reduces the computational
complexity. For more details, refer to [6], [7].
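
As a rough illustration of (10) and (12), the following sketch evaluates the decision function for given multipliers. It does not train an SVM: the function names, the toy support vectors and the values of the multipliers are hypothetical and chosen by hand so that they satisfy the constraints in (8).

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Plain inner product u . v; with this kernel (12) reduces to (10)."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(
    x: Vector,
    support_vectors: Sequence[Vector],
    alphas: Sequence[float],
    labels: Sequence[int],
    b: float,
    kernel: Callable[[Vector, Vector], float] = linear_kernel,
) -> float:
    """Evaluate f(x) = sum_t alpha_t * l_t * k(x_t, x) + b, as in (12)."""
    return sum(
        a * l * kernel(sv, x)
        for a, l, sv in zip(alphas, labels, support_vectors)
    ) + b


def classify(x, support_vectors, alphas, labels, b, kernel=linear_kernel) -> int:
    """Predicted label in {-1, +1}, taken from the sign of f(x)."""
    value = decision_value(x, support_vectors, alphas, labels, b, kernel)
    return 1 if value >= 0 else -1


# Hypothetical toy problem (not from the paper): two support vectors on the
# first axis. Here w = sum alpha*l*x = (1, 0) and b = 0, so the hyperplane is
# x_1 = 0 and both support vectors satisfy l * (w . x + b) = 1.
toy_svs = [(1.0, 0.0), (-1.0, 0.0)]
toy_alphas = [0.5, 0.5]
toy_labels = [1, -1]
toy_b = 0.0
```

Passing a different `kernel` callable is all that is needed to move from the linear form (10) to the nonlinear form (12), which is the point made in the text above.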
A function is a valid kernel if there exists a mapping $\varphi$
satisfying (11). Mercer's condition [8] states which kinds of
functions are valid kernels. Some commonly used kernels are as
follows: polynomial kernels

$$ k(x_i, x_j) = (x_i \cdot x_j + 1)^d, \tag{13} $$

radial basis function (RBF) kernels

$$ k(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \tag{14} $$

and neural network type kernels

$$ k(x_i, x_j) = \tanh((x_i \cdot x_j) + b), \tag{15} $$

in which the heavy-tailed RBF kernel has the form

$$ k(x_i, x_j) = \exp(-\gamma \|x_i^a - x_j^a\|^b), \tag{16} $$

and the Gaussian RBF kernel is

$$ k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right). \tag{17} $$
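
The kernels (13) to (17) can be written down directly. The sketch below is only an illustration: the function names and default parameter values are hypothetical, and the heavy-tailed kernel is implemented under the assumption that the powers in (16) are taken element-wise and the norm is Euclidean.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def squared_distance(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def polynomial_kernel(u: Vector, v: Vector, d: int = 2) -> float:
    """(13): (u . v + 1)^d."""
    return (dot(u, v) + 1.0) ** d


def rbf_kernel(u: Vector, v: Vector, gamma: float = 1.0) -> float:
    """(14): exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * squared_distance(u, v))


def neural_kernel(u: Vector, v: Vector, b: float = 1.0) -> float:
    """(15): tanh((u . v) + b)."""
    return math.tanh(dot(u, v) + b)


def heavy_tailed_rbf_kernel(u, v, gamma=1.0, a=1.0, b=1.0) -> float:
    """(16): exp(-gamma * ||u^a - v^a||^b).

    Assumption: powers are element-wise and the features are non-negative
    (as for amplitude spectra) whenever a is not an integer.
    """
    diff = [ui ** a - vi ** a for ui, vi in zip(u, v)]
    norm = math.sqrt(sum(d * d for d in diff))
    return math.exp(-gamma * norm ** b)


def gaussian_rbf_kernel(u: Vector, v: Vector, sigma: float = 1.0) -> float:
    """(17): exp(-||u - v||^2 / (2 sigma^2))."""
    return math.exp(-squared_distance(u, v) / (2.0 * sigma ** 2))
```

Note that (17) is the special case of (14) with $\gamma = 1/(2\sigma^2)$, so the two functions agree when the parameters are matched in that way.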



III. Dimensionality Reduction 

Dimensionality reduction is a very effective tool in the machine
learning field.

In the rest of this paper, the original data are assumed to be
a set of $M$ samples $x_i \in R^N$, $i = 1, 2, \cdots, M$, and
the reduced-dimensionality samples of $x_i$ are $y_i \in R^K$,
$i = 1, 2, \cdots, M$, where $K \ll N$. $x_{ij}$ and $y_{ij}$ are the
componentwise elements of $x_i$ and $y_i$, respectively.



A. Principal Component Analysis 

PCA [19] is the best-known linear dimensionality reduction
method. PCA aims to find a subspace $\Omega$ that maximally
retains the variance of the original dataset. The basis of $\Omega$ is
obtained by eigen-decomposition of the covariance matrix. The
procedure can be summarized in the following four steps.

1) Compute the covariance matrix of $x_i$

$$ C = \frac{1}{M} \sum_{i=1}^{M} (x_i - u)(x_i - u)^T \tag{18} $$

where $u = \frac{1}{M} \sum_{i=1}^{M} x_i$ is the mean of the given
samples and $T$ means transpose.

2) Calculate the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$ and
the corresponding eigenvectors $v_1, v_2, \cdots, v_N$ of the
covariance matrix $C$.

3) The basis of $\Omega$ is $v_1, v_2, \cdots, v_K$.

4) Dimensionality reduction by

$$ y_{ij} = (x_i - u) \cdot v_j. \tag{19} $$

The value of $K$ is determined by the criterion

$$ \frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} > \text{threshold}. \tag{20} $$
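
The four PCA steps and the criterion (20) can be sketched with NumPy as follows. This is a minimal sketch, not the code used in the experiments: the function names and the synthetic demo data are hypothetical.

```python
import numpy as np


def pca_fit(X: np.ndarray, threshold: float = 0.95):
    """Steps 1)-3): return the mean u, the basis V (N x K) and all eigenvalues."""
    u = X.mean(axis=0)
    Xc = X - u
    C = Xc.T @ Xc / X.shape[0]               # covariance matrix, eq. (18)
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to lambda_1 >= lambda_2 >= ...
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Eq. (20): smallest K whose cumulative eigenvalue ratio exceeds the threshold.
    K = int(np.searchsorted(ratio, threshold, side="right")) + 1
    K = min(K, X.shape[1])
    return u, eigvecs[:, :K], eigvals


def pca_transform(X: np.ndarray, u: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Step 4), eq. (19): y_ij = (x_i - u) . v_j."""
    return (X - u) @ V


# Hypothetical demo data: 200 points close to a line in R^3.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X_demo = np.outer(t, [1.0, 2.0, 3.0]) + 0.01 * rng.normal(size=(200, 3))
u_demo, V_demo, eig_demo = pca_fit(X_demo, threshold=0.95)
Y_demo = pca_transform(X_demo, u_demo, V_demo)
```

Because the demo data vary almost entirely along one direction, the criterion (20) keeps a single component here, which mirrors the situation in which one feature carries most of the information.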



B. Kernel Principal Component Analysis 

PCA works well for high-dimensional data with linear
variability, but fails when the data have nonlinear structure.
KPCA [20], on the other hand, is designed to extract the
nonlinear structure of the original data. It uses a kernel
function $k$ (the same as in SVM) to implicitly map the original
data into a feature space $F$, where $\varphi$ is the mapping from
the original space to the feature space. In $F$, the PCA algorithm
can work well.

If $k$ is a valid kernel function, the matrix

$$ K = (k(x_i, x_j))_{i,j=1}^{M} \tag{21} $$

must be positive semi-definite. The matrix $K$ is the so-called
kernel matrix.

Assume the mean of the feature space data $\varphi(x_i)$,
$i = 1, 2, \cdots, M$, is zero, i.e.,

$$ \frac{1}{M} \sum_{i=1}^{M} \varphi(x_i) = 0. \tag{22} $$

The covariance matrix in $F$ is

$$ C^F = \frac{1}{M} \sum_{i=1}^{M} \varphi(x_i) \varphi(x_i)^T. \tag{23} $$

In order to apply PCA in $F$, the eigenvectors $v_j^F$ of $C^F$ are
needed. Because the mapping $\varphi$ is not explicitly known, the
eigenvectors of $C^F$ cannot be derived as easily as in PCA.
However, the eigenvectors $v_j^F$ of $C^F$ must lie in the span [20]
of $\varphi(x_i)$, $i = 1, 2, \cdots, M$, i.e.,

$$ v_j^F = \sum_{n=1}^{M} \alpha_{jn} \varphi(x_n). \tag{24} $$

It has been proved that $\alpha_j$, $j = 1, 2, \cdots, M$, are
eigenvectors of the kernel matrix $K$ [20], in which $\alpha_{jn}$ are
the componentwise elements of $\alpha_j$.

The procedure of KPCA can then be summarized in the
following six steps:

1) Choose a kernel function $k$.

2) Compute the kernel matrix

$$ K_{ij} = k(x_i, x_j). \tag{25} $$

3) The eigenvalues $\lambda_1^K \ge \lambda_2^K \ge \cdots \ge \lambda_M^K$ and the
corresponding eigenvectors $\alpha_1, \alpha_2, \cdots, \alpha_M$ are obtained by
diagonalizing $K$.

4) Normalize $v_j^F$ by [20]

$$ 1 = \lambda_j^K (\alpha_j \cdot \alpha_j). \tag{26} $$

5) The normalized eigenvectors $v_j^F$, $j = 1, 2, \cdots, K$,
constitute the basis of a subspace in $F$.

6) The projection of a training point $x_i$ on $v_j^F$,
$j = 1, 2, \cdots, K$, is computed by

$$ y_{ij} = (v_j^F \cdot \varphi(x_i)) = \sum_{n=1}^{M} \alpha_{jn} k(x_n, x_i). \tag{27} $$

Usually the threshold in (20) is set to 0.95 or 0.90.

The idea of the kernel in KPCA is exactly the same as in
SVM. All kernel functions used in SVM can be employed in
KPCA, too.
So far the mean of $\varphi(x_i)$, $i = 1, 2, \cdots, M$, has been
assumed to be zero. In fact, the zero-mean data in the feature
space are

$$ \varphi(x_i) - \frac{1}{M} \sum_{i=1}^{M} \varphi(x_i). \tag{28} $$

The kernel matrix for these centered (zero-mean) data can be
derived by [20]

$$ \tilde{K} = K - 1_M K - K 1_M + 1_M K 1_M \tag{29} $$

in which $(1_M)_{ij} := 1/M$.
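
The six KPCA steps together with the centering in (29) can be sketched with NumPy as below. This is an illustrative sketch only: the function names and the random demo data are hypothetical, the Gaussian RBF kernel (17) is used as one possible choice, and only the projection of the training points is shown (projecting new points would additionally require centering their kernel values with the training statistics).

```python
import numpy as np


def gaussian_kernel_matrix(A: np.ndarray, B: np.ndarray, sigma: float) -> np.ndarray:
    """K_ij = exp(-||a_i - b_j||^2 / (2 sigma^2)), the Gaussian RBF kernel (17)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def kpca_fit(X: np.ndarray, n_components: int, sigma: float):
    """Steps 1)-5): return normalized coefficient vectors, eigenvalues, centered K."""
    M = X.shape[0]
    K = gaussian_kernel_matrix(X, X, sigma)                  # eq. (25)
    one_M = np.full((M, M), 1.0 / M)
    Kc = K - one_M @ K - K @ one_M + one_M @ K @ one_M       # eq. (29)
    lam, alpha = np.linalg.eigh(Kc)                          # ascending order
    order = np.argsort(lam)[::-1][:n_components]             # keep the largest ones
    lam, alpha = lam[order], alpha[:, order]
    # Eq. (26): rescale each unit-norm alpha_j so that lambda_j * (alpha_j . alpha_j) = 1.
    alpha = alpha / np.sqrt(lam)
    return alpha, lam, Kc


def kpca_project_training(alpha: np.ndarray, Kc: np.ndarray) -> np.ndarray:
    """Step 6), eq. (27) on the training points: y_ij = sum_n alpha_jn k(x_n, x_i)."""
    return Kc @ alpha


# Hypothetical demo data (not the Wi-Fi measurements).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
sigma_demo = 5.5 / np.sqrt(2.0)          # chosen so that 2 * sigma^2 = 5.5^2
alpha_demo, lam_demo, Kc_demo = kpca_fit(X_demo, n_components=2, sigma=sigma_demo)
Y_demo = kpca_project_training(alpha_demo, Kc_demo)
```

The mapping $\varphi$ never appears in the code; only kernel evaluations are needed, which is the property the derivation above relies on.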

C. Maximum Variance Unfolding 

Among the manifold learning methods, the MVU [29] approach
will be applied in our experiments. With the help of an
optimization toolbox, MVU learns the inner product matrix of
$y_i$ automatically by maximizing their variance subject to the
constraints that the $y_i$ are centered and that the local distances
of $y_i$ are equal to the local distances of $x_i$. Here the local
distances are the distances between $y_i$ ($x_i$) and its $k$
nearest neighbors, in which $k$ is a parameter.

The intuitive explanation of this approach is that when an
object such as a string is unfolded optimally, the Euclidean
distance between its two ends is maximized. Thus the
optimization objective function can be written as

$$ \text{maximize} \sum_{ij} \|y_i - y_j\|^2 \tag{30} $$

subject to the constraints

$$ \begin{aligned} &\sum_i y_i = 0 \\ &\|y_i - y_j\|^2 = \|x_i - x_j\|^2 \quad \text{when } \eta_{ij} = 1 \end{aligned} \tag{31} $$

in which $\eta_{ij} = 1$ means $x_i$ and $x_j$ are $k$ nearest neighbors;
otherwise $\eta_{ij} = 0$.

Applying the inner product matrix

$$ I = (y_i \cdot y_j)_{i,j=1}^{M} \tag{32} $$

of $y_i$ to the above optimization makes the model simpler.

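The procedure of MVU can be summarized as follows:

1) Optimization step: because $I$ is an inner product matrix,
it must be positive semi-definite. Thus the above optimization
can be reformulated into the following form [29]

$$ \begin{aligned} &\text{maximize} && \mathrm{trace}(I) \\ &\text{subject to} && I \succeq 0 \\ & && \sum_{ij} I_{ij} = 0 \\ & && I_{ii} - 2 I_{ij} + I_{jj} = D_{ij} \quad \text{when } \eta_{ij} = 1 \end{aligned} \tag{33} $$

where $D_{ij} = \|x_i - x_j\|^2$, and $I \succeq 0$ represents that $I$ is
positive semi-definite.

2) The eigenvalues $\lambda_1^y \ge \lambda_2^y \ge \cdots \ge \lambda_M^y$ and the
corresponding eigenvectors $v_1^y, v_2^y, \cdots, v_M^y$ are obtained by
diagonalizing $I$.

3) Dimensionality reduction by

$$ y_{ij} = \sqrt{\lambda_j^y} \, v_{ji}^y \tag{34} $$

in which $v_{ji}^y$ are the componentwise elements of $v_j^y$.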
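
The parts of the MVU procedure that do not need a semidefinite solver can be sketched with NumPy: building the neighbor indicator $\eta_{ij}$ and the local distances $D_{ij}$, checking the constraints of (33), and recovering the coordinates from an inner product matrix via (34). The optimization step itself is deliberately left out. All function names are hypothetical, the symmetrization of $\eta$ is an assumption, and the demo uses the Gram matrix of centered data as a stand-in for the solver output, since it trivially satisfies the constraints.

```python
import numpy as np


def knn_indicator(X: np.ndarray, k: int):
    """Return (eta, D): eta_ij marks k-nearest-neighbour pairs, D_ij = ||x_i - x_j||^2.

    Assumption: eta is symmetrized, and the points are distinct so that each
    point is its own nearest neighbour at distance zero.
    """
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(D, axis=1)[:, 1:k + 1]      # column 0 is the point itself
    eta = np.zeros_like(D, dtype=bool)
    rows = np.repeat(np.arange(X.shape[0]), k)
    eta[rows, nearest.ravel()] = True
    return eta | eta.T, D


def local_isometry_gap(I: np.ndarray, D: np.ndarray, eta: np.ndarray) -> float:
    """Largest violation of I_ii - 2 I_ij + I_jj = D_ij over neighbour pairs."""
    d = np.diag(I)
    Dy = d[:, None] - 2.0 * I + d[None, :]
    return float(np.abs(Dy - D)[eta].max())


def embed_from_gram(I: np.ndarray, K: int) -> np.ndarray:
    """Steps 2)-3), eq. (34): y_ij = sqrt(lambda_j) * v_ji for the top-K eigenpairs."""
    lam, V = np.linalg.eigh(I)
    order = np.argsort(lam)[::-1][:K]
    lam, V = np.clip(lam[order], 0.0, None), V[:, order]
    return V * np.sqrt(lam)


# Hypothetical demo: the Gram matrix of centered 2-D points stands in for the
# matrix that a semidefinite solver would return.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 2))
Xc_demo = X_demo - X_demo.mean(axis=0)
I_demo = Xc_demo @ Xc_demo.T
eta_demo, D_demo = knn_indicator(X_demo, k=3)
Y_demo = embed_from_gram(I_demo, K=2)
```

In this stand-in case the recovered coordinates reproduce all pairwise distances of the input, so the local isometry constraint holds exactly; with a true MVU solution only the neighbor distances are preserved while the remaining distances are stretched.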



Landmark-MVU (LMVU) [39] is a modified version of
MVU that aims at solving larger-scale problems than MVU.
It works by using the inner product matrix $A$ of randomly
chosen landmarks from $x_i$ to approximate the full matrix $I$,
in which the size of $A$ is much smaller than that of $I$.

Assume the number of landmarks is $m$ and the landmarks are
$a_1, a_2, \cdots, a_m$. Let $Q$ [39] denote a linear
transformation between the landmarks and the original data
$x_i \in R^N$, $i = 1, 2, \cdots, M$; accordingly,

$$ \mathbf{x} \approx Q \, \mathbf{a} \tag{35} $$

in which

$$ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_M \end{pmatrix}, \quad \mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix}. \tag{36} $$



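Assume the reduced-dimensionality landmarks of
$a_1, a_2, \cdots, a_m$ are $\tilde{y}_1, \tilde{y}_2, \cdots, \tilde{y}_m$, and the
reduced-dimensionality samples of $x_1, x_2, \cdots, x_M$ are
$y_1, y_2, \cdots, y_M$. Then the linear transformation between
$y_1, y_2, \cdots, y_M$ and $\tilde{y}_1, \tilde{y}_2, \cdots, \tilde{y}_m$ is $Q$ as well
[39]; consequently,

$$ \begin{pmatrix} y_1 \\ \vdots \\ y_M \end{pmatrix} \approx Q \begin{pmatrix} \tilde{y}_1 \\ \vdots \\ \tilde{y}_m \end{pmatrix}. \tag{37} $$

Matrix $A$ is the inner-product matrix of $\tilde{y}_1, \tilde{y}_2, \cdots, \tilde{y}_m$,

$$ A = (\tilde{y}_i \cdot \tilde{y}_j)_{i,j=1}^{m}, \tag{38} $$

hence the relationship between $I$ and $A$ is

$$ I \approx Q A Q^T. \tag{39} $$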

The optimization of (33) can be reformulated into the
following form:

$$ \text{maximize} \quad \mathrm{trace}(Q A Q^T) \tag{40} $$

$$ \text{subject to} \quad A \succeq 0, \quad \sum_{ij} (Q A Q^T)_{ij} = 0, \quad D_{ij}^y \le D_{ij} \ \text{when } \eta_{ij} = 1 \tag{41} $$

in which

$$ D_{ij}^y = (Q A Q^T)_{ii} - 2 (Q A Q^T)_{ij} + (Q A Q^T)_{jj} \tag{42} $$

and $A \succeq 0$ represents that $A$ is positive semi-definite. This
optimization model differs from (33) in that the equality
constraints for nearby distances are relaxed to inequality
constraints in order to guarantee the feasibility of the
simplified optimization model.

LMVU increases the speed of computation at the cost of
reduced accuracy. LMVU is applied in the simulations of this
paper.
IV. The Procedure of Using SVM and
Dimensionality Reduction

As aforementioned, in radar and sensing signal processing, 
the control of DOF — or dimensionality — is the first step, called 
pre-processing. Both the linear and nonlinear methods will be 
investigated in this paper as pre-processing tools to extract the 
intrinsic dimensionality of the collected data. 

First, SVM will be employed for classification in cognitive
radio. The classification power of SVM is evaluated on the testing
set. The procedure of SVM for classification is summarized
as follows.

1) The collected dataset $x_i$, $i = 1, 2, \cdots, M$, is divided
into a training set and a testing set. The training
samples and testing samples are $x_i$ with indexes $i_t$,
$t = 1, 2, \cdots, T$, and $i_s$, $s = 1, 2, \cdots, S$, respectively.

2) The labels $l_{i_t}$ and $l_{i_s}$ for $x_{i_t}$ and $x_{i_s}$ are extracted.

3) A kernel function is chosen from (13) to (17), and the
parameter values of the chosen kernel are specified.

4) The separating hyperplane in the higher dimensional feature
space $F$, which has the form of (12), is trained on the
training set $x_{i_t}$.

5) The classification performance of the trained hyperplane
is tested on the testing set $x_{i_s}$.

The above process is repeated to obtain averaged test errors.

Apart from applying SVM alone to classification, dimensionality
reduction is implemented before SVM to remove the
redundant information of the collected data. The procedure of
the proposed algorithm, SVM combined with dimensionality
reduction, can be summarized as follows:

1) The collected dataset $x_i$, $i = 1, 2, \cdots, M$, is divided
into a training set and a testing set. The training
samples and testing samples are $x_i$ with indexes $i_t$,
$t = 1, 2, \cdots, T$, and $i_s$, $s = 1, 2, \cdots, S$, respectively.

2) The labels $l_{i_t}$ and $l_{i_s}$ for $x_{i_t}$ and $x_{i_s}$ are extracted.

3) The reduced-dimension data $y_{i_t}$ and $y_{i_s}$ are obtained by
using dimensionality reduction methods.

4) $y_{i_t}$ and $y_{i_s}$ are taken as the new training set and testing
set.

5) The labels $l_{i_t}$ and $l_{i_s}$ for $y_{i_t}$ and $y_{i_s}$ are kept the
same as for $x_{i_t}$ and $x_{i_s}$.

6) A kernel function is chosen from (13) to (17), and the
parameter values of the chosen kernel are specified.

7) The separating hyperplane in the higher dimensional feature
space $F$, which has the form of (12), is trained on the
new training set $y_{i_t}$.

8) The classification performance of the trained hyperplane
is tested on the new testing set $y_{i_s}$.

The above process is repeated to obtain averaged test
errors. The flow chart of SVM combined with dimensionality
reduction methods for classification is shown in Fig. 1.

PCA, KPCA and MVU are systematically studied in this paper
so that data with different structures can be handled.
Dimensionality reduction with PCA to derive $y_{i_t}$ and $y_{i_s}$ is
implemented as follows.



Fig. 1. The flow chart of SVM combined with dimensionality reduction for
classification.



1) The training set $x_{i_t}$ is input to the PCA procedure from
step 1) to step 3) in Section III-A to obtain the eigenvectors
$v_1, v_2, \cdots, v_K$.

2) Dimensionality reduction by

$$ y_{i_t} = (x_{i_t} - u) \cdot (v_1, v_2, \cdots, v_K) \tag{43} $$

$$ y_{i_s} = (x_{i_s} - u) \cdot (v_1, v_2, \cdots, v_K) \tag{44} $$



Dimensionality reduction with KPCA to derive $y_{i_t}$ and $y_{i_s}$
is implemented as follows.

1) The training set $x_{i_t}$ is input to the KPCA procedure from
step 1) to step 5) in Section III-B to obtain the eigenvectors
$v_j^F$, $j = 1, 2, \cdots, K$, implicitly.

2) Dimensionality reduction by

$$ y_{i_t} = \left( \sum_{n=1}^{T} \alpha_{1n} k(x_n, x_{i_t}), \cdots, \sum_{n=1}^{T} \alpha_{Kn} k(x_n, x_{i_t}) \right), \tag{45} $$

$$ y_{i_s} = \left( \sum_{n=1}^{T} \alpha_{1n} k(x_n, x_{i_s}), \cdots, \sum_{n=1}^{T} \alpha_{Kn} k(x_n, x_{i_s}) \right). \tag{46} $$



Dimensionality reduction with LMVU to derive $y_{i_t}$ and $y_{i_s}$
is implemented as follows.

1) Both the training set $x_{i_t}$ and the testing set $x_{i_s}$ are input
to the MVU procedure from step 1) to step 2) in Section
III-C to obtain the eigenvalues $\lambda_1^y \ge \lambda_2^y \ge \cdots \ge \lambda_M^y$ and
the corresponding eigenvectors $v_1^y, v_2^y, \cdots, v_M^y$.

2) Dimensionality reduction by

$$ y_{i_t j} = \sqrt{\lambda_j^y} \, v_{j i_t}^y \tag{47} $$

$$ y_{i_s j} = \sqrt{\lambda_j^y} \, v_{j i_s}^y \tag{48} $$

To use LMVU, it is necessary to substitute (40) for (33) in the
MVU procedure.
V. Wi-Fi Signal Measurement

Wi-Fi time-domain signals have been measured and
recorded using an advanced digital phosphor oscilloscope (DPO),
a Tektronix DPO72004 [40]. The DPO supports a maximum bandwidth
of 20 GHz and a maximum sampling rate of 50 GS/s. It is
capable of recording up to 250 M samples per channel. In the
measurement, a laptop accesses the Internet through a
Wi-Fi router, as shown in Fig. 2. An antenna with a frequency
range of 800 MHz to 2500 MHz is placed near the laptop and
connected to the DPO. The sampling rate of the DPO is set to
6.25 GS/s. The recorded time-domain Wi-Fi signals are shown in
Fig. 3. The duration of the recorded Wi-Fi signals is 40 ms.

The recorded 40-ms Wi-Fi signals are divided into 8000
slots, with each slot lasting 5 µs. The time-domain Wi-Fi
signals within the first 1 µs of every slot are then transformed
into the frequency domain using the fast Fourier transform (FFT).
The spectral state of the measured Wi-Fi signal at each time
slot has two possibilities: either the spectrum is occupied at
this time slot (the state is busy, $l_i = 1$) or the spectrum is
unoccupied (the state is idle, $l_i = 0$). In this paper,
the frequency band of 2.411 - 2.433 GHz is considered. The
resolution in the frequency domain is 1 MHz. Thus, for each slot,
23 points in the frequency domain are obtained. The total
obtained data are shown in Fig. 4.
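
The numbers quoted above follow from the sampling rate and the slot layout. The short calculation below reproduces them; the variable names are hypothetical, and it assumes that the FFT is taken over exactly the samples of the first 1 µs of a slot with no zero padding.

```python
SAMPLING_RATE_HZ = 6.25e9       # DPO sampling rate used in the measurement
RECORD_DURATION_S = 40e-3       # total recorded duration
SLOT_DURATION_S = 5e-6          # length of one time slot
FFT_WINDOW_S = 1e-6             # part of each slot that is transformed
BAND_START_HZ = 2.411e9
BAND_STOP_HZ = 2.433e9

total_samples = round(SAMPLING_RATE_HZ * RECORD_DURATION_S)    # 250,000,000 samples
num_slots = round(RECORD_DURATION_S / SLOT_DURATION_S)         # 8000 slots
samples_per_slot = round(SAMPLING_RATE_HZ * SLOT_DURATION_S)   # 31,250 samples
fft_length = round(SAMPLING_RATE_HZ * FFT_WINDOW_S)            # 6,250 samples

# Frequency resolution of an FFT over a 1 us window: 1 / (1 us) = 1 MHz.
resolution_hz = SAMPLING_RATE_HZ / fft_length

# FFT bins that cover 2.411 GHz to 2.433 GHz inclusive.
first_bin = round(BAND_START_HZ / resolution_hz)
last_bin = round(BAND_STOP_HZ / resolution_hz)
points_per_slot = last_bin - first_bin + 1                     # 23 points
```

The 40-ms record at 6.25 GS/s amounts to 250 M samples, which matches the stated per-channel record length of the DPO.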

VI. Experimental Validation 

The spectral states of all time slots of the measured Wi-Fi
signal can be divided into two classes (busy, $l_i = 1$, or idle,
$l_i = 0$). SVM, a powerful classification technique in machine
learning, can be employed to classify the data at each
time slot. The processed Wi-Fi data are shown in Fig. 4. The
data used below are from the 1101st time slot to the 8000th
time slot.

Fig. 5. False alarm rate.

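In the next experiment, the amplitude values in the frequency
domain of the $i$th usable time slot are taken as $x_i$. The
experiment data are taken as

$$ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_{tot} \end{pmatrix} = \begin{pmatrix} x_{1,12-m} & \cdots & x_{1,12} & \cdots & x_{1,12+n} \\ x_{2,12-m} & \cdots & x_{2,12} & \cdots & x_{2,12+n} \\ \vdots & & \vdots & & \vdots \\ x_{tot,12-m} & \cdots & x_{tot,12} & \cdots & x_{tot,12+n} \end{pmatrix} \tag{49} $$

where $x_{ij}$ represents the amplitude value at the $j$th frequency
point of the $i$th time slot. The dimension of $x_i$ is
$N = n + m + 1$, in which $0 \le n - m \le 1$.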
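
The window in (49) is centered on the 12th of the 23 frequency points and grows by one point at a time as $N$ increases. The sketch below shows one way to pick $m$ and $n$ for a given $N$ and to extract the corresponding features from a slot; the function names are hypothetical.

```python
CENTER = 12          # 1-based index of the middle of the 23 frequency points
NUM_POINTS = 23


def window_offsets(N: int):
    """Return (m, n) with N = n + m + 1 and 0 <= n - m <= 1."""
    if not 1 <= N <= NUM_POINTS:
        raise ValueError("N must be between 1 and 23")
    m = (N - 1) // 2
    n = N - 1 - m
    return m, n


def select_features(slot_amplitudes, N: int):
    """Return (x_{i,12-m}, ..., x_{i,12}, ..., x_{i,12+n}) for one time slot."""
    if len(slot_amplitudes) != NUM_POINTS:
        raise ValueError("expected 23 frequency points per slot")
    m, n = window_offsets(N)
    start = CENTER - m - 1       # convert the 1-based frequency index to 0-based
    stop = CENTER + n            # exclusive end of the slice
    return list(slot_amplitudes[start:stop])
```

For example, $N = 13$ gives $m = n = 6$, so the window covers frequency points 6 to 18.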

First, the true state $l_i$ of $x_i$ is obtained. The false alarm
rate, miss detection rate and total error rate of the experiments
are shown in Fig. 5, Fig. 6 and Fig. 7, respectively, in which
the total error rate is the total number of miss detection and
false alarm samples divided by the size of the testing set.
The results shown are averaged over 50 experiments. In each
experiment, the size of the training set is 200 and the size of
the testing set is 1800.
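
The three error measures can be computed from the true and predicted states as sketched below. The total error rate follows the definition given above. The denominators of the false alarm rate and the miss detection rate are not spelled out in the text, so the sketch assumes the conventional choice: false alarms are divided by the number of idle samples and miss detections by the number of busy samples. The function name is hypothetical.

```python
def error_rates(true_states, predicted_states):
    """Return (false_alarm_rate, miss_detection_rate, total_error_rate).

    States use the convention 1 = busy and 0 = idle.
    Assumption: false alarms are normalized by the number of idle samples
    and miss detections by the number of busy samples.
    """
    if len(true_states) != len(predicted_states) or not true_states:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(true_states, predicted_states))
    false_alarms = sum(1 for t, p in pairs if t == 0 and p == 1)
    misses = sum(1 for t, p in pairs if t == 1 and p == 0)
    idle = sum(1 for t in true_states if t == 0)
    busy = sum(1 for t in true_states if t == 1)
    false_alarm_rate = false_alarms / idle if idle else 0.0
    miss_detection_rate = misses / busy if busy else 0.0
    total_error_rate = (false_alarms + misses) / len(true_states)
    return false_alarm_rate, miss_detection_rate, total_error_rate
```

Averaging these three values over the repeated random splits gives the kind of curves reported in the figures.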

In these results, the steps of the "SVM" method for the $j$th
experiment are:

1) The training set and testing set are chosen, denoted by $x_{i_t}$
and $x_{i_s}$.

2) $x_{i_t}$ and $x_{i_s}$ are taken by (49) with dimensionality
$N = 1, 2, 3, \cdots, 13$, in which $0 \le n - m \le 1$.

3) For each dimension $N$, $x_{i_t}$ and $x_{i_s}$ are fed to the SVM
algorithm.

The above process is repeated for $j = 1, 2, \cdots, 50$, and the
corresponding averaged values of these 50 experiments are
derived for each dimension.

Fig. 6. Miss detection rate.

Fig. 7. Total error rate.

For the methods "PCA with SVM", "KPCA with SVM" and
"LMVU with SVM", the steps for the $j$th experiment are:

1) The training set and testing set are chosen, denoted by $x_{i_t}$
and $x_{i_s}$.

2) $x_{i_t}$ and $x_{i_s}$ are taken by (49) with dimensionality
$N = 13$, in which $n = m = 6$.

3) $y_{i_t}$ and $y_{i_s}$ are the reduced-dimensional samples of
$x_{i_t}$ and $x_{i_s}$ obtained with the PCA, KPCA and LMVU methods,
respectively.

4) The dimensions of $y_{i_t}$ and $y_{i_s}$ are varied manually over
$K = 1, 2, 3, \cdots, 13$.

5) For each dimension $K$, $y_{i_t}$ and $y_{i_s}$ are fed to the SVM
algorithm.

The above process is repeated for $j = 1, 2, \cdots, 50$, and the
corresponding averaged values of these 50 experiments are
derived for each dimension.

For the LMVU approach, both the placement and
the number of landmarks can influence its performance. The
landmarks for each experiment are chosen as follows. For
every experiment, the number of landmarks $m$ is equal to
20. At the beginning of the LMVU process, ten groups of
randomly chosen positions in $x_i$ (including both training and
testing sets) are obtained, with 20 positions in each group, and
these positions are then fixed. For each experiment, results
with the landmark positions assigned by each group are obtained.
Of the ten groups of landmarks, the group that gives the
minimal total error rate is taken as the landmarks for this
experiment.

Fig. 8. Distributions of corresponding eigenvalues.

Throughout the experiment, the Gaussian RBF kernel with
$2\sigma^2 = 5.5^2$ is used for KPCA. The parameter $k = 3$, in which
$k$ is the number of nearest neighbors of $y_i$ ($x_i$) (including both
training and testing sets) for LMVU. The optimization toolbox
SeDuMi 1.1R3 [41] is applied to solve the optimization step
in LMVU. The SVM toolbox SVM-KM [42] is used to train
and test the SVM. The kernels selected for SVM are
heavy-tailed RBF kernels with parameters $\gamma = 1$, $a = 1$, $b = 1$.
These parameters are kept unchanged for the whole experiment.

Take the data set of the first experiment as an example. The
distributions of the corresponding normalized eigenvalues (the
sum of all eigenvalues equals one) of the PCA, KPCA and
LMVU methods are shown in Fig. 8. It can be seen that the
largest eigenvalues for the three methods are all dominant
(84%, 73%, 97% for PCA, KPCA and LMVU, respectively).
Consequently, even one-dimensional reduced data can capture
most of the useful information in the original data.

As can be seen from Fig. 5, Fig. 6 and Fig. 7, with SVM
alone the classification errors are very low. This means
SVM successfully classifies the states of the Wi-Fi signal. The
results also verify that, as more dimensions of the data are
included (more information is included), the error rates become
smaller overall. On the other hand, feeding $y_i$ to SVM gives
higher accuracy at lower dimensionality than feeding $x_i$ directly
to SVM. By dimensionality reduction, most of the useful
information in $x_i$ is extracted into the first $K = 1$
dimension of $y_i$. Therefore, even if we increase the dimension
of the reduced data, the error rates show no obvious
improvement. The error rates of the proposed algorithm with
only one feature can match the error rates obtained with 13
features of the original data.

VII. Conclusion 

One fundamental open problem is to determine how and
where machine learning algorithms are useful in a cognitive
radio network. Network dimensionality has received attention
in the information theory literature. One naturally wonders
how network (signal) dimensionality affects the performance
of system operation. Here we study, as an illustrative example,
spectral state classification in this context. Both linear
(PCA) and nonlinear (KPCA and LMVU) methods are studied
by combining them with SVM.

Experimental results show that, with only one feature
fed to SVM, the false alarm rate of the method with
dimensionality reduction is at worst 0.03068%, compared with
0.06581% for the method without dimensionality reduction, and
the miss detection rate is 0.9883% compared with 4.255%. The
error rates of the methods with dimensionality reduction using
only one feature can nearly match the error rates obtained with
13 features of the original data.

The results of applying SVM alone verify that, as more
dimensions of the data are included, the error rates become
smaller overall, since more information of the original data is
considered. However, SVM combined with dimensionality reduction
does not show this property. This is because the reduced data
with only one dimension already capture most of the
information of the original Wi-Fi signal. Therefore, even if we
increase the dimension of the reduced data, the error rates
show no obvious improvement.

In this paper, the SNR of the measured Wi-Fi signal is high,
which makes the classification error rates all very
low. In fact, dimensionality reduction can not only remove
redundant information but also has a de-noising effect.
When the collected data contain more noise, SVM
combined with dimensionality reduction should show more
advantages. Besides, the original dimension of the spectral
domain Wi-Fi signal is not high, which also makes the advantage
of dimensionality reduction less obvious. In the future,
dimensionality reduction used as a pre-processing tool will be
applied to more scenarios in cognitive radio, such as low
SNR (spectrum sensing) and very high original dimension
(cognitive radio network) contexts.

For the MVU approach, the LMVU method is used in this
paper, which decreases the accuracy. In addition, the
optimization toolbox SeDuMi 1.1R3 is used to solve the
optimization step, which slows down the computation. A
dedicated optimization algorithm for MVU should be developed
in the future.

In the future, more machine learning algorithms will be
systematically explored and applied to the cognitive radio
network under different scenarios.